Improving Artificial Neural Network Estimates of Posterior Probabilities of Speech Sounds
Doctoral Thesis Proposal

Authors

  • Samuel Thomas
  • Aren Jansen
  • Mounya Elhilali

Abstract

Speech carries information from at least three sources: the message being communicated, the speaker who is communicating, and the environment. In this work we propose several approaches to improve the recognition of the speech sounds that convey information about the message. As basic units we use phonemes, which occur on a time scale of a few tens of milliseconds in the speech signal. Better recognition of these units yields considerable performance gains in applications such as automatic speech recognition (ASR), where the goal is to transcribe the message into text, and automatic speaker verification, which uses information in the speaker component to verify the speaker's claimed identity. The proposed approaches improve phoneme posterior estimates from artificial neural networks; they include combining information from multiple acoustic streams and using different neural network training architectures. For speech recognition, especially in low-resource scenarios where the amount of training data is limited (for example, 1 hour of training data), features extracted from the improved phoneme posteriors provide significant word recognition improvements. For speaker recognition, these posteriors are used in a recently proposed neural network architecture and give considerable improvements over earlier neural-network-based approaches. In future work we would like to investigate the enhancement of phoneme posteriors for these applications in noisy environments. In a multistream speech recognition framework, we propose to use statistics derived from phoneme posteriors to determine the reliability of individual streams and how the streams can be combined effectively to derive better posteriors. We would also like to explore how these posteriors can be used for reliable voice activity detection in these environments.
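The abstract proposes using statistics derived from phoneme posteriors to judge the reliability of individual streams and to decide how to combine them. As a minimal sketch of one such statistic, the code below assumes per-frame inverse-entropy weighting, a technique known from the multistream ASR literature and not necessarily the one this proposal will adopt. The function names `entropy` and `inverse_entropy_combine` are hypothetical, and the posterior values are toy numbers.

```python
import math
from typing import List, Sequence


def entropy(posteriors: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one frame's phoneme posterior vector."""
    return -sum(p * math.log(p) for p in posteriors if p > 0.0)


def inverse_entropy_combine(
    streams: List[Sequence[float]], eps: float = 1e-6
) -> List[float]:
    """Merge one frame's phoneme posteriors from several acoustic streams.

    Hypothetical helper: each stream is weighted by 1 / (entropy + eps), so a
    confident (peaky) stream counts more than an uncertain (flat) one. This is
    an assumed reliability statistic, not one confirmed by the proposal.
    """
    if not streams:
        raise ValueError("need at least one stream")
    n_phones = len(streams[0])
    if any(len(s) != n_phones for s in streams):
        raise ValueError("all streams must share the same phoneme set")

    # Reliability of each stream: inverse entropy, with eps guarding against
    # division by zero for a one-hot (zero-entropy) posterior.
    inv_ent = [1.0 / (entropy(s) + eps) for s in streams]
    total = sum(inv_ent)
    weights = [w / total for w in inv_ent]

    # Weighted sum of the stream posteriors, phoneme by phoneme.
    combined = [
        sum(w * s[k] for w, s in zip(weights, streams)) for k in range(n_phones)
    ]
    # Renormalize in case the input vectors did not sum exactly to one.
    norm = sum(combined)
    return [c / norm for c in combined]


if __name__ == "__main__":
    confident = [0.90, 0.05, 0.05]  # toy posteriors over three phonemes
    uncertain = [0.34, 0.33, 0.33]
    print(inverse_entropy_combine([confident, uncertain]))
```

With a confident stream [0.90, 0.05, 0.05] and a near-uniform one [0.34, 0.33, 0.33], the combined posterior for the first phoneme comes out above the plain average of 0.62, because the flat stream is treated as less reliable. Entropy is used here only because it needs no extra training; other posterior-derived statistics could be substituted in the same place.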

Similar resources

Enhancing Posterior Based Speech Recognition Systems

The use of local phoneme posterior probabilities has been increasingly explored for improving speech recognition systems. Hybrid hidden Markov model / artificial neural network (HMM/ANN) and Tandem are the most successful examples of such systems. In this thesis, we present a principled framework for enhancing the estimation of local posteriors, by integrating phonetic and lexical knowledge, as...

REMAP: recursive estimation and maximization of a posteriori probabilities in connectionist speech recognition

In this paper, we briefly describe REMAP, an approach for the training and estimation of posterior probabilities, and report its application to speech recognition. REMAP is a recursive algorithm that is reminiscent of the Expectation Maximization (EM) [5] algorithm for the estimation of data likelihoods. Although very general, the method is developed in the context of a statistical model for tran...

REMAP: Recursive Estimation and Maximization of A Posteriori Probabilities - Application to Transition-Based Connectionist Speech Recognition

In this paper, we introduce REMAP, an approach for the training and estimation of posterior probabilities using a recursive algorithm that is reminiscent of the EM-based Forward-Backward (Liporace 1982) algorithm for the estimation of sequence likelihoods. Although very general, the method is developed in the context of a statistical model for transition-based speech recognition using Artificia...

Convolutional Neural Network with Adaptive Windows for Speech Recognition

Although speech recognition systems are widely used and their accuracy continues to improve, there is still a considerable performance gap between their accuracy and human recognition ability. This is partially due to high speaker variations in speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...

Estimating Posterior Probabilities In Classification Problems With Neural Networks

Classification problems are used to determine the group membership of multi-dimensional objects and are prevalent in every organization and discipline. Central to the classification determination is the posterior probability. This paper introduces the theory and applications of the classification problem, and of neural network classifiers. Through controlled experiments with problems of kno...

Publication date: 2011